Applying Co-training to Clickthrough Data for Search Engine Adaptation
Abstract
The information on the World Wide Web is growing without bound, and users may have very diverse preferences in the pages they target through a search engine. It is therefore a challenging task to adapt a search engine to suit the needs of a particular community of users who share similar interests. In this paper, we propose a new algorithm, Ranking SVM in a Co-training Framework (RSCF). The RSCF algorithm takes as input the clickthrough data, which records the items in the search results that a user has clicked on, and generates adaptive rankers as output. By analyzing the clickthrough data, RSCF first splits it into a labelled data set, which contains the items that have already been scanned, and an unlabelled data set, which contains the items that have not yet been scanned. The labelled data is then augmented with unlabelled data to obtain a larger data set for training the rankers. We demonstrate that the RSCF algorithm produces better ranking results than the standard Ranking SVM algorithm. Based on RSCF, we develop a metasearch engine that comprises MSNSearch, Wisenut, and Overture, and carry out an online experiment to show that our metasearch engine outperforms Google.
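The labelled/unlabelled split described in the abstract can be illustrated with a small sketch. The abstract does not say how "scanned" items are identified or how labels are derived, so the code below makes two assumptions: the user reads results top-down and stops at the last click, and a clicked item is preferred over any unclicked item ranked above it (a common clickthrough heuristic, not necessarily the one RSCF uses). The function name `split_clickthrough` and the document IDs are hypothetical.

```python
from typing import List, Set, Tuple


def split_clickthrough(
    ranking: List[str], clicked: Set[str]
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Split one query's result list into labelled pairs and unlabelled items.

    Assumption (not stated in the abstract): the user scans top-down and
    stops at the last clicked result, so everything below it is unscanned.
    """
    clicked_positions = [i for i, doc in enumerate(ranking) if doc in clicked]
    if not clicked_positions:
        # No clicks: nothing can be labelled under this heuristic.
        return [], list(ranking)

    last_click = clicked_positions[-1]
    scanned = ranking[: last_click + 1]
    unlabelled = ranking[last_click + 1 :]

    # Assumed labelling rule: a clicked item is preferred over every
    # unclicked item that was ranked above it.
    pairs: List[Tuple[str, str]] = []
    for i, doc in enumerate(scanned):
        if doc in clicked:
            for skipped in scanned[:i]:
                if skipped not in clicked:
                    pairs.append((doc, skipped))
    return pairs, unlabelled


ranking = ["d1", "d2", "d3", "d4", "d5"]
pairs, unlabelled = split_clickthrough(ranking, {"d2", "d4"})
```

Here `pairs` holds (preferred, less preferred) tuples that could serve as training data for a pairwise ranker, while `unlabelled` holds the items a co-training loop would try to label and fold back into the training set.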
Similar resources
Spying Out Accurate User Preferences for Search Engine Adaptation
Most existing search engines employ static ranking algorithms that do not adapt to the specific needs of users. Recently, some researchers have studied the use of clickthrough data to adapt a search engine’s ranking function. Clickthrough data indicate for each query the results that are clicked by users. As a kind of implicit relevance feedback information, clickthrough data can easily be coll...
Cross-Market Model Adaptation with Pairwise Preference Data for Web Search Ranking
Machine-learned ranking techniques automatically learn a complex document ranking function given training data. These techniques have demonstrated the effectiveness and flexibility required of a commercial web search. However, manually labeled training data (with multiple absolute grades) has become the bottleneck for training a quality ranking function, particularly for a new domain. In this p...
Addressing Malicious Noise in Clickthrough Data
Clickthrough logs are becoming an increasingly used source of training data for learning ranking functions. Due to the large impact that the position in search results has on commercial websites, malicious noise is bound to appear in search engine click logs. We present preliminary work in addressing this form of noise, that we term click-spam. We analyze click-spam from a utility standpoint, a...
Optimizing Web Search Using Spreading Activation on the Clickthrough Data
In this paper, we propose a mining algorithm that uses user clickthrough data to improve search performance. The algorithm first explores the relationship between queries and Web pages and mines co-visiting relationships as virtual links among the Web pages; a Spreading Activation mechanism is then used to perform the query-dependent search. Our approach could overcome the challenge...
Query Session Data vs. Clickthrough Data as Query Suggestion Resources
Query suggestion has become one of the most fundamental features of Web search engines. Some query suggestion algorithms utilize query session data, while others utilize clickthrough data. The objective of this study is to examine which of these two resources can provide more effective query suggestions. Our results show that query session data outperforms clickthrough data in terms of clickthr...